Abstract

The notion of return probability - explored most famously by George
Polya on d-dimensional lattices - has potential as a measure for the
analysis of networks. We present an efficient method for finding return
probability distributions for connected undirected graphs. We argue that
return probability has the same discriminatory power as existing k-step
measures - in particular, beta centrality (with negative β), the
graph-theoretical power index (GPI), and subgraph centrality. We compare
the running time of our algorithm to beta centrality and subgraph
centrality and find that it is significantly faster. When return
probability is used to measure the same phenomena as beta centrality, it
runs in linear time - O(n + m), where n and m are the number of nodes and
edges, respectively - which takes much less time than either the matrix
inversion or the sequence of matrix multiplications required for
calculating the exact or approximate forms of beta centrality,
respectively. We call this form of return probability the Polya power
index (PPI). Computing subgraph centrality requires an expensive
eigendecomposition of the adjacency matrix; return probability runs in
half the time of the eigendecomposition on a 2000-node network. These
performance improvements are important because computationally efficient
measures are necessary in order to analyze large networks.
1 Introduction



The probability that a random walk on a graph returns to the node where it 
began - the probability of returning to the origin or simply return probability - 
is a fairly well-known notion in the literature of random walks. Research in this 
area originally concentrated on return probability on infinite regular graphs. In 
his seminal work [1], Polya proved that a random walk on an infinite 1- or
2-dimensional lattice returns to the origin with probability p = 1, but when
d > 2, p < 1. Methods for determining the value of p for 3-dimensional
lattices were subsequently discovered [2] [3] [4] [5]. Polya's theorem has
also been applied to electrical networks by Doyle and Snell [6]. Return
probability continues to be explored in contemporary research, although the
venue has shifted from graphs of fixed degree to random graphs [7] [8] [9]
and to spectral methods [10].

There is a class of measures which compute some value for a node i based on 
paths up to length n originating at i. Some representative members of this class 
are degree centrality, beta centrality [11], the graph-theoretical power index
(GPI) (see [12] for an overview), and subgraph centrality [13]. These measures
have been called "n-path centralities" [14]. This term is problematic. First, a
measure is only a centrality when it satisfies certain requirements, such as those
proposed in [15]. Beta centrality with negative β and the GPI, however, are not
centralities. "Path" is infelicitous, too, because each measure pays attention
to different entities. Beta centrality is based on walks, the GPI counts disjoint
paths, and subgraph centrality is derived from closed walks. We propose to refer
to them instead as "k-step measures."

Return probability is a k-step measure as well, and it has a few virtues that
distinguish it from the others: (1) being a probability, it is always in the range 
[0, 1] and requires no normalization, so the return probability of two nodes can 
always be meaningfully compared, even when the nodes are in different networks; 
(2) it permits precise control of the length of walks over which it is computed; 
and (3) it can be computed very efficiently. 

The notation used in this article is mostly conventional. We only consider
graphs G = (V, E) that are simple, connected, and undirected. Let n = |V|
be the number of vertices, and m = |E| be the number of edges. The length
of some sequence of adjacent vertices - e.g., a path or a walk - is denoted by
k. Let A = A(G) be the adjacency matrix of G, where $a_{ij} = 1$ if there is an
edge between i and j and $a_{ij} = 0$ otherwise. Let P = P(G) be the transition
probability matrix of G, where $p_{ij} = 1/\deg(i)$ if i and j are adjacent, and
$p_{ij} = 0$ otherwise. We denote the probability of an event X by
$\mathbb{P}(X)$. We occasionally diverge from convention. In Subsection 2.1 we
use $A^{(k)}$ - with parentheses that distinguish it from the usual $A^k$ - to
indicate a kind of k-th power of A that is essential for computing return
probability. And in Subsection 3.1 we abuse the ≫ and ≪ symbols to compress
our visual comparison of the power-related k-step measures.
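As a concrete illustration of this notation, the following minimal sketch builds A and P for a small graph. The function names `adjacency_matrix` and `transition_matrix` and the 3-node path used as input are illustrative assumptions, not part of the original method; exact fractions are used only to keep the probabilities readable.

```python
from fractions import Fraction


def adjacency_matrix(n, edges):
    """Return the n x n adjacency matrix A of a simple undirected graph."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1
    return A


def transition_matrix(A):
    """Return P, where p_ij = 1/deg(i) if i and j are adjacent, else 0.

    Assumes the graph is connected, so every degree is at least 1.
    """
    P = []
    for row in A:
        deg = sum(row)
        P.append([Fraction(a, deg) for a in row])
    return P


# Hypothetical example: the path 0 - 1 - 2.
A = adjacency_matrix(3, [(0, 1), (1, 2)])
P = transition_matrix(A)
```

Each row of P sums to 1, which is what makes it usable as a random-walk transition matrix.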

Beta centrality and the GPI are measures of exclusionary power (footnote 1).
They identify the relative power of nodes in a network by their ability to
exclude their neighbors from some valuable interaction. Beta centrality of
node i is the i-th component of the vector:

\[ c(\alpha, \beta) = \alpha \sum_{k=0}^{\infty} \beta^{k} A^{k+1} \mathbf{1} \]

where α is a scaling constant and $\mathbf{1}$ is the all-ones vector.
According to Bonacich [11], the "sign of β corresponds exactly to the
distinction . . . between positive and negative exchange systems" and its
magnitude "affects the degree to which distant ties are taken into account".
In this article, we are only interested in beta centrality with negative
values of β. The GPI is defined as:

\[ \mathrm{GPI}_i(e) = \sum_{k=1}^{g} (-1)^{(k-1)} m_{ik} \]

where g is the diameter of the network, $m_{ik}$ is the number of
non-intersecting paths of length k originating at node i, and e is the number
of exchange opportunities that node i has in any round (footnote 2). Finally,
subgraph centrality [13] - a measure of the number of subgraphs in which a
node participates - is defined as:

\[ SC(i) = \sum_{k=0}^{\infty} \frac{(A^k)_{ii}}{k!} \]

Later we discuss in detail other connections among beta centrality, the GPI,
subgraph centrality, and return probability. For now it suffices to note a few
characteristics shared just by beta centrality and subgraph centrality, and to
situate return probability in relation to them. First, beta centrality and
subgraph centrality are formally expressed as involving increasing powers of
the adjacency matrix A. Return probability is expressed in a similar way
(although in practice we use a stochastic matrix), but each power of A must be
modified before the subsequent power can be computed. Second, beta centrality
and subgraph centrality are expressed as infinite sums. Since cumulative return
probability converges to 1 as the walk length k → ∞, the return probability
for each node is the same in the limit, which is not informative. Instead we
find a distribution of return probabilities over walks of length 1, . . . , k.
Finally, beta and subgraph centrality are scalar, assigning a single real value
to a node. In contrast, a distribution of return probabilities is a sequence of
real values. To reduce a distribution of return probabilities for a node to a
scalar value, we take either the return probability or the cumulative return
probability at some chosen k. This allows us to compare return probability to
other measures.

1 Alter-based centrality [16] is notably similar to return probability with
k = 2. In its negative mode, it can, like return probability, be used in lieu
of beta centrality with negative β.

2 The GPI has undergone many changes since its inception. This equation for the GPI
appeared early in the tussle of theories competing in the exchange network literature in the
'80s and '90s. Improved methods, results of which we use later in this paper, focus on the
probability of a node being excluded in a round of exchanges.
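To make the two infinite series concrete, here is a minimal sketch that truncates them after a fixed number of terms. It assumes the standard forms of the series, namely α Σ β^k A^(k+1) 1 for beta centrality and Σ (A^k)_ii / k! for subgraph centrality; the function names, the truncation lengths, and the 3-node path used as input are illustrative choices. The beta centrality series only converges when |β| is smaller than the reciprocal of the largest eigenvalue of A.

```python
from math import factorial


def mat_vec(A, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def beta_centrality(A, beta, alpha=1.0, terms=60):
    """Truncated series alpha * sum_k beta^k A^(k+1) 1 (assumed form)."""
    v = mat_vec(A, [1] * len(A))      # A 1, the degree vector (k = 0 term)
    total = [0.0] * len(A)
    weight = 1.0                      # beta^k
    for _ in range(terms):
        total = [t + weight * x for t, x in zip(total, v)]
        v = mat_vec(A, v)             # advance to the next power of A
        weight *= beta
    return [alpha * t for t in total]


def subgraph_centrality(A, terms=30):
    """Truncated series sum_k (A^k)_ii / k! for every node i."""
    n = len(A)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # A^0 = I
    sc = [0.0] * n
    for k in range(terms):
        for i in range(n):
            sc[i] += power[i][i] / factorial(k)
        power = [[sum(power[i][t] * A[t][j] for t in range(n))
                  for j in range(n)] for i in range(n)]
    return sc


# Hypothetical example: the path 0 - 1 - 2.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
degree_like = beta_centrality(path, 0.0)    # beta = 0 reduces to degree
negative = beta_centrality(path, -0.2)
sc = subgraph_centrality(path)
```

With β = 0 only the first term survives, which is why beta centrality then coincides with degree.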
The purpose of this article is to show that return probability is equivalent
to these three k-step measures, and that it can be computed more efficiently,
much more so in some cases. The rest of this article is organized as follows. In
Section 2 we propose and validate a method for finding return probabilities. It
is based on a particular kind of walk - the self-absorbing walk - which we use to
model the probability of returning to the origin for the first time. If return
probability is a useful measure, what does it measure? We devote Section 3
to that question, showing that with k = 2, return probability is strongly
related to existing power measures, implying that return probability is at least
an approximation of exclusionary power. We call this measure the Polya power
index (PPI). We also show that return probability with k > 2 is equivalent to
subgraph centrality. Finally, we show that return probability is significantly
more efficient to compute than beta centrality and subgraph centrality. Section
4 contains further discussion. Section 5 concludes.

2 Computing Return Probability 

2.1 Algorithm 

Consider a random walk on a graph G = (V, E). Choose some node i ∈ V as
the origin and begin to walk. If we return to i, the walk terminates, and we
start a new walk. To emphasize that in these walks i becomes a terminating
point only after the walk leaves i, we call this a self-absorbing walk. With such
walks, returning to the origin at step k is mutually exclusive with returning to
the origin before step k. Thus the probability of returning to the origin in a
k-step walk is related to the following two probabilities:

1. The probability of returning to i at step k. 

2. The probability of not returning to i at any step < k. 

For the first probability, let $\mathrm{Next}_{i,k}$ be the event of returning
to i at step k on a self-absorbing walk originated from i. To compute
$\mathbb{P}(\mathrm{Next}_{i,k})$, we must know the states we can potentially
be in on a walk of length k − 1. From there we must count the number of next
steps that are possible from that set of states, taking care to distinguish
those that return to i from those that do not. Define $\mathrm{Steps}_{i,k}$ as
the number of possible next steps from the set of possible states after a walk
of length k − 1 and define $\mathrm{ReturnSteps}_{i,k}$ as the number of
possible next steps that return to i from the same set of possible states. Then
$\mathbb{P}(\mathrm{Next}_{i,k})$ is:

\[ \mathbb{P}(\mathrm{Next}_{i,k}) = \frac{\mathrm{ReturnSteps}_{i,k}}{\mathrm{Steps}_{i,k}} \tag{1} \]

The second of these two probabilities is the complement of the probability of
returning in any step < k. Let $R_{i,k}$ denote the event of returning to the
origin i at step k. Then the probability of not returning to the origin in
k − 1 steps is:



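\[
\begin{aligned}
\mathbb{P}(\overline{R_{i,1}} \wedge \dots \wedge \overline{R_{i,k-1}})
  &= \mathbb{P}(\overline{R_{i,1} \vee \dots \vee R_{i,k-1}}) \\
  &= 1 - \mathbb{P}(R_{i,1} \vee \dots \vee R_{i,k-1}) \\
  &= 1 - (\mathbb{P}(R_{i,1}) + \dots + \mathbb{P}(R_{i,k-1})) \\
  &= 1 - \sum_{x=1}^{k-1} \mathbb{P}(R_{i,x})
\end{aligned}
\tag{2}
\]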

Combining (1) and (2) yields our equation for the return probability for any
node i ∈ V and any length k:

\[ \mathbb{P}(R_{i,k}) = \left(1 - \sum_{x=1}^{k-1} \mathbb{P}(R_{i,x})\right) \mathbb{P}(\mathrm{Next}_{i,k}) \tag{3} \]
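A small worked example may help to see how the two factors interact. It is only a sketch under stated assumptions: the graph is a hypothetical path a - b - c with origin a, and $\mathbb{P}(\mathrm{Next}_{a,k})$ is read as the chance that the next step returns to a given that the walk has not yet returned.

```latex
% Path a - b - c, origin a. A simple graph has no self-loops, so
% P(R_{a,1}) = 0.
%
% k = 2: after one step the walk is at b, which has two possible next
% steps, one of which returns to a.
\mathbb{P}(R_{a,2}) = (1 - 0) \cdot \tfrac{1}{2} = \tfrac{1}{2}
%
% k = 3: a walk that has not returned is at c, whose only next step
% leads to b, so no next step returns to a.
\mathbb{P}(R_{a,3}) = \left(1 - \tfrac{1}{2}\right) \cdot 0 = 0
%
% k = 4: a walk that has not returned is back at b.
\mathbb{P}(R_{a,4}) = \left(1 - \left(0 + \tfrac{1}{2} + 0\right)\right) \cdot \tfrac{1}{2} = \tfrac{1}{4}
```

The step-wise values 1/2, 0, 1/4, 0, 1/8, ... sum to 1, in line with cumulative return probability converging to 1 on a finite connected graph.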



It is well known that an element $a^{k}_{ij}$ of the k-th power of A is the
number of walks of length k from node i to node j. A non-zero diagonal element
$a^{k}_{ii}$ indicates the number of closed walks of length k originating at i.
If the diagonal of A and its powers are left undefined, the same process
computes simple paths instead of walks. However, neither technique counts
self-absorbing walks. The first one fails to terminate a walk once it returns
to the origin, causing it to be counted more than once; and because the
diagonal is all zeros in the second one, it disallows returning to the origin
altogether. To count self-absorbing walks, our computation must permit a walk
to return to the origin and must terminate a walk once it returns. To
accomplish this, we compute $\mathbb{P}(R_{i,k})$ by taking modified powers of
the adjacency matrix A. Define zd(A) as a function that sets the diagonal
entries of A to 0. Then we compute the modified k-th power of A as:

\[
A^{(k)} =
\begin{cases}
A & \text{if } k = 1 \\
\mathrm{zd}(A^{(k-1)})\, A & \text{otherwise}
\end{cases}
\tag{4}
\]

where A is the original adjacency matrix. Note that we use $A^{(k)}$ instead of
$A^k$ to distinguish our modified matrix multiplication from ordinary matrix
multiplication. To understand the purpose of setting the diagonal to 0, consider
that the expression $A^k A$ extends the k-step walks of $A^k$ with the 1-step
walks of A. Setting the diagonal entries of $A^k$ to 0 causes walks that return
to the origin at step k to terminate at the origin, which satisfies the
definition of a self-absorbing walk. It is easy to see analogies among the terms
of Equation 3 and Equation 4 - that is, between
$1 - \sum_{x=1}^{k-1} \mathbb{P}(R_{i,x})$ and $\mathrm{zd}(A^{(k-1)})$ on the
left, and $\mathbb{P}(\mathrm{Next}_{i,k})$ and A on the right.
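The difference between ordinary and modified powers can be checked on a tiny graph. The sketch below is illustrative only: the helper names and the choice of a triangle as input are assumptions, and plain nested lists stand in for whatever matrix representation one would use in practice.

```python
def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def zero_diagonal(M):
    """zd(M): a copy of M with every diagonal entry set to 0."""
    return [[0 if i == j else M[i][j] for j in range(len(M))]
            for i in range(len(M))]


def ordinary_power(A, k):
    """A^k by repeated multiplication (counts all walks of length k)."""
    result = [row[:] for row in A]
    for _ in range(k - 1):
        result = mat_mul(result, A)
    return result


def modified_power(A, k):
    """A^(k): A if k == 1, otherwise zd(A^(k-1)) A."""
    result = [row[:] for row in A]
    for _ in range(k - 1):
        result = mat_mul(zero_diagonal(result), A)
    return result


# Hypothetical example: the triangle on nodes 0, 1, 2.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
closed_walks = ordinary_power(triangle, 4)[0][0]      # all closed 4-walks
self_absorbing = modified_power(triangle, 4)[0][0]    # first returns only
```

In the triangle, six closed walks of length 4 start at node 0, but only two of them (0-1-2-1-0 and 0-2-1-2-0) avoid the origin until the final step, and those are the ones the modified power keeps.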
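Computing $A^{(k)}$ gives us the values of $\mathrm{ReturnSteps}_{i,k}$ and
$\mathrm{Steps}_{i,k}$. Since $a^{(k)}_{ii}$ is $\mathrm{ReturnSteps}_{i,k}$ and
$\sum_j a^{(k)}_{ij}$ is $\mathrm{Steps}_{i,k}$, we redefine
$\mathbb{P}(\mathrm{Next}_{i,k})$ as follows:

\[
\mathbb{P}(\mathrm{Next}_{i,k}) =
\begin{cases}
\dfrac{a^{(k)}_{ii}}{\sum_j a^{(k)}_{ij}} & \text{if } \sum_j a^{(k)}_{ij} \neq 0 \\
0 & \text{if } \sum_j a^{(k)}_{ij} = 0
\end{cases}
\tag{5}
\]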

Thus instead of taking increasing powers of an adjacency matrix, and counting
and dividing at each step, we take increasing powers of a transition
probability matrix P. Specifically, we design the following procedure for
computing the distribution of expected return probabilities for each vertex
i ∈ V and for steps 1, . . . , k:

1. Initialization

(a) Initialize x to 1.

(b) Initialize k to the number of steps to perform.

(c) Initialize the transition probability matrix P.

2. Iteration

(a) If x > k, terminate.

(b) Compute $P^{(x)}$ with Equation 4.

(c) Read the values from the diagonal of $P^{(x)}$; the value $p^{(x)}_{ii}$ is the
expected return probability for node i at step x.

(d) Increase x by 1.
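The steps above can be sketched in a few lines. This is a minimal dense-matrix illustration rather than the benchmarked implementation: the function names are assumptions, exact fractions replace floating point for readability, and zeroing the diagonal is done by skipping the intermediate index t = i in the product.

```python
from fractions import Fraction


def return_probabilities(A, k):
    """Return a list of k rows; row x-1 holds P(R_{i,x}) for every node i."""
    n = len(A)
    P = [[Fraction(a, sum(row)) for a in row] for row in A]
    current = P                                   # P^(1)
    dist = []
    for x in range(1, k + 1):
        if x > 1:
            # zd(P^(x-1)) P: omitting t == i zeroes the diagonal entry.
            current = [[sum(current[i][t] * P[t][j]
                            for t in range(n) if t != i)
                        for j in range(n)] for i in range(n)]
        dist.append([current[i][i] for i in range(n)])
    return dist


def cumulative(dist, i):
    """Running sum of node i's step-wise return probabilities."""
    total, out = 0, []
    for row in dist:
        total += row[i]
        out.append(total)
    return out


# Hypothetical example: the path 0 - 1 - 2.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
dist = return_probabilities(path, 4)
```

For the middle node of the path the walk always returns at step 2, so its cumulative return probability reaches 1 immediately, whereas the end nodes approach 1 only gradually.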

The complexity of this algorithm, computed naively, is $O(kn^3)$. It can be
computed much more efficiently using sparse matrices. Arithmetic operations
on them are proportional to nnz, the number of non-zero entries. However, as
k increases, $P^{(k)}$ becomes less sparse and the benefits of sparse matrix
multiplication decrease. Let $nnz_k$ be the number of non-zero entries in the
k-th matrix computed by our algorithm. Then with sparse matrices the time
complexity of our algorithm is $O(k \times nnz_k)$. We present another
optimization in Subsection 3.1 when we discuss return probability as a measure
of power.
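The effect of sparsity can be illustrated without a sparse-matrix library. The sketch below stores each matrix row as a dictionary of its non-zero entries, which is an assumption made for self-containedness (the benchmarks reported later use SciPy), and it records the number of stored entries at each step so that the fill-in is visible. All names are hypothetical.

```python
def sparse_transition(adj):
    """adj maps node -> set of neighbours; returns rows {i: {j: 1/deg(i)}}."""
    return {i: {j: 1.0 / len(nb) for j in nb} for i, nb in adj.items()}


def sparse_step(current, P):
    """One application of Equation 4 on sparse rows: zd(current) P."""
    nxt = {}
    for i, row in current.items():
        out = {}
        for t, w in row.items():
            if t == i:
                continue              # zeroed diagonal: walk was absorbed
            for j, p in P[t].items():
                out[j] = out.get(j, 0.0) + w * p
        nxt[i] = out
    return nxt


def sparse_return_probabilities(adj, k):
    """Return (dist, nnz): per-step return probabilities and entry counts."""
    P = sparse_transition(adj)
    current = P
    dist, nnz = [], []
    for x in range(1, k + 1):
        if x > 1:
            current = sparse_step(current, P)
        dist.append({i: current[i].get(i, 0.0) for i in adj})
        nnz.append(sum(len(row) for row in current.values()))
    return dist, nnz


# Hypothetical example: the path 0 - 1 - 2 as adjacency sets.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
dist, nnz = sparse_return_probabilities(adj, 2)
```

Only stored entries are ever touched, so the work per step tracks the number of non-zero entries rather than n squared.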

The return probability for an entire network and some k can be computed 
by averaging the return probabilities of all nodes: 

\[ \mathbb{P}(R_k) = \frac{1}{n} \sum_{i \in V} \mathbb{P}(R_{i,k}) \tag{6} \]

This network-wide measure can be used in the same fashion as the node-specific
form of the measure. It can generate a distribution of probabilities, either
step-wise or cumulative. For easy comparison to other measures, it can be
reduced to a scalar value by taking the step-wise or cumulative return
probability at a given k. As expected, Equation 6 reaches its highest value in
a dyad, where it is 1, regardless of k. Any other network has a network return
probability < 1.

Table 1: High correlation between actual and expected returns in simulations
with increasing number of walks on two 100-node graphs.

2.2 Validation 

To validate that our method correctly computes expected return probabilities, 
we conduct an experiment similar to those in [17]. In this case we release a
random walker on a 100-node scale-free network and count the number of times 
it returns or fails to return to the origin for walks of particular lengths. 

Since the walks are random, we do not know the length of any given walk 
in advance. Rather, we start walking from a node i and if we return to i in 
k steps, we record this fact and start another walk. When we have completed 
some number of walks using a node i as the origin, we compute the actual return 
rates for i on walks of length k by counting the number of times we returned to 
i on a walk of length k and dividing by the total number of walks. As shown 
in Figure 1 for two different nodes in a scale-free network, expected return
probabilities computed by our algorithm match well with the actual return 
rates found after 1000 walks, for different k values. 
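A simulation of this kind can be sketched as follows. The function name, the adjacency-list input, the fixed seed, and the cap on walk length are all assumptions made for illustration; the experiment reported here used a 100-node scale-free network rather than the toy path shown.

```python
import random


def simulate_return_rates(adj, origin, max_len, walks, seed=0):
    """Estimate step-wise return rates for self-absorbing walks.

    adj maps each node to a list of neighbours. A walk stops as soon as
    it returns to the origin, or after max_len steps without returning.
    """
    rng = random.Random(seed)
    counts = [0] * (max_len + 1)
    for _ in range(walks):
        node = origin
        for step in range(1, max_len + 1):
            node = rng.choice(adj[node])
            if node == origin:
                counts[step] += 1     # first return happened at this step
                break
    return [c / walks for c in counts[1:]]


# Hypothetical example: the path 0 - 1 - 2.
path = {0: [1], 1: [0, 2], 2: [1]}
from_middle = simulate_return_rates(path, 1, 4, 100)
from_end = simulate_return_rates(path, 0, 4, 20000)
```

From the middle node every walk returns at step 2, while from an end node the observed rates should settle near 1/2 at step 2 and 1/4 at step 4 as the number of walks grows.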

In general our experience is that the number of walks required in order
to achieve a high correlation between expected and actual returns is small, as
shown in Table 1. We have also tested with other graphs of varying sizes, which
exhibit the same general correspondence between expected return probabilities
and actual return rates.

Figure 1: Expected and actual return probabilities for walks of increasing length
from two different nodes in a 100-node scale-free network. Actual return rates
were averaged over 1000 walks started at each node.

3 Return Probability and Other Measures 

If return probability is a meaningful measure, in what sense is it so? We answer 
this question by exploring relationships between return probability and two 
measures of power - beta centrality and the graph-theoretical power index (GPI) 
- and subgraph centrality. We find that return probability resembles a measure 
of exclusionary power when k = 2, which we call the Polya power index (PPI).
We also find that return probability is equivalent to subgraph centrality when 
k > 2. It also has asymptotically significant running-time advantages over both 
beta centrality and subgraph centrality. 

3.1 Power Measures 

Beta centrality and the GPI originated in the competing theories of exchange 
networks, thus many of the experiments conducted with them are concerned 
primarily with acts of exchange. However, they - like return probability - 
may be appropriate for identifying powerful nodes in non-exchange networks 
as well. When we refer to these measures, including return probability, as 
measures of power, we mean power in the broadest sense of the term, not just 
limited to exchange networks. Thus, while we rely on results in the exchange 
network literature to illustrate the relationships among return probability, beta 
centrality, and the GPI, we do not think of return probability necessarily as 
a mechanism for generating predictions for the outcomes of network exchange 
experiments. We subscribe to the distinction between "power as a potential and
power as an activity" and claim only that return probability can identify
nodes in powerful positions.

We begin with the results of an early experiment in exchange network theory
[18], using them to compare return probability and beta centrality only. The
networks used in this early experiment, shown in Figure 2, were discovered later
to be strong-power networks [20]. We then turn to the weak-power networks of
Figure 3 and use them to compare return probability, beta centrality, and the
GPI.

Beta centrality is motivated in part by the fact that in [18], classical
centrality measures - degree, betweenness, and closeness - failed to predict the
outcomes of experiments with negatively-connected exchange networks. In an
exchange network, actors exchange objects of value. An exchange network is
connected positively or negatively. Imagine an exchange network consisting of
three participants A, B, and C; A is connected to B, and B is connected to C. If
the network is positively connected, an act of exchange between A and B does
not preclude a concurrent act of exchange between B and C. If the network is
negatively connected, B cannot exchange with A and C at the same time. Here
we are primarily concerned with negatively-connected networks.

Figure 2: Strong-power exchange networks illustrating the Power-Dependence
Theory (PDT) experiments of Cook et al. People with structurally similar
positions in the network are assigned the same category (E, D, F). People are
labeled with their category and their 2-step return probability. According to
PDT, power is distributed in these networks according to the relation E > D =
F. The 2-step return probabilities agree with the predictions of Cook et al. (See
also Figure 1 in [T5].)

Figure 3: Weak-power networks. (Panel (c): Borg-6.)

The GPI originated in the network exchange literature as well. It is associ- 
ated with Elementary Theory, a competitor to PDT that has itself fared quite 
well in the experimental literature. There are several versions of the GPI. Here 
we rely on Markovsky's version and results from the Social Networks special 
issue on exchange networks [21] . That version is known to produce contradic- 
tory results under some conditions (see footnote 2 of [12] ) , but to our knowledge 
those conditions do not apply to these particular results. Many of the revisions 
of the GPI that occurred after the formulation of the original GPI - includ- 
ing Markovsky's - make use of a probability of a node being excluded from 
exchange. 

We first show that 2-step return probability does not contradict some PDT
predictions. The people in the networks in Figure 2 are labeled with a
category determined by a person's position in the network and the person's
cumulative 2-step return probability. The networks are isomorphic to four
exchange networks analyzed by both Cook et al. [18] and Bonacich. For
simplicity of presentation, the figures only include the more profitable solid
lines from the original network; the dashed lines are excluded. Following both
Cook et al. and Bonacich, we compute return probability using only the solid
lines. Basing their hypotheses on PDT, Cook et al. predicted that the power
distributed among the actors in these networks would reach the equilibrium
E > D = F. Their prediction was supported by both a laboratory experiment and
computer simulations. The labels of Figure 2 show that 2-step return
probability matches the predicted equilibrium exactly for all graphs. One
obtains the same results using beta centrality, with one exception: in the
7-person network, the relation remains D > E > F, the same as a conventional
centrality.

Table 2: Relative power of nodes in weak-power networks according to return
probability, beta centrality, and Markovsky's GPI.

It was discovered by Markovsky et al. that there are in fact different classes
of networks [20]. The networks used in the aforementioned experiment are
strong-power networks - networks in which the relations of exchange are stable
and the nodes in positions of relative strength dominate their exchange
partners. There are also equal-power networks in which no actor has an
advantage. A third class - weak-power networks - are structurally somewhere
between strong- and equal-power networks. Figure 3 contains several examples of
weak-power networks. The discovery of the different classes of networks brought
a deeper understanding of the nature of the networks themselves. In
strong-power networks, for example, the actors are clearly divided between those
with high power and those with low power; low-power actors are only connected
to high-power actors. Such networks are bipartite or very close to bipartite.
We revisit bipartivity in Subsection 3.2.

Table 2 shows the relative power for all connected nodes in the networks in
Figure 3 computed by return probability, beta centrality, and the GPI
(footnote 3). If the power of a node A exceeds the power of neighbor B, then
the value of edge AB is listed as >. Here we consider the GPI to be a
touchstone. Where beta centrality or return probability disagree with the GPI,
the symbol is doubled and in bold (e.g., ≪). This abuses notation somewhat but
is readable enough. Note that return probability agrees with the GPI for all
edges of all networks.
The only disagreements are between the GPI and beta centrality. In L5-Stem, 
beta centrality identifies C as the most powerful node in the network, whereas 
the GPI and 2-step return probability have B on equal footing with C. In K- 
Stem, the GPI and return probability compute that B is less powerful than D, 
and beta centrality holds the opposite. There is a similar disagreement over 
the edge BC in Borg-6. Generally, it seems that in these particular weak-power 
networks, beta centrality has difficulty identifying the power conferred on a node 
i when it is connected to a node j that has no other exchange opportunities. 
In such a configuration, i is always guaranteed the option of trading with a 
relatively powerless neighbor. 

Since in this case we only need to concern ourselves with k = 2, we can
reformulate the original algorithm from Subsection 2.1 to be even more
parsimonious. This doesn't reduce the time complexity over our original
algorithm in a meaningful way, because the number of non-zero entries in a
sparse matrix is already related to the number of vertices and edges. However,
it provides a form of the equation that can easily be computed when the network
is represented in memory as a graph, not as a matrix. In honor of George Polya,
this is the Polya power index (PPI):

\[ \mathrm{PPI}(i) = \sum_{j} p_{ij}\, p_{ji} = \frac{1}{\deg(i)} \sum_{j \in N(i)} \frac{1}{\deg(j)} \]

where N(i) is the set of neighbors of i.
When applying this computation to an entire graph, each edge appears twice
in a summation, so it runs in time O(n + m). This is significantly faster than
beta centrality. Exact implementations of beta centrality require a somewhat
costly $O(n^3)$ matrix inversion. Approximate implementations sum the first k
terms of beta centrality's infinite series; just like return probability, the
approximate version of beta centrality can be computed with sparse matrices,
which makes the matrix multiplications sub-cubic. Even then our algorithm is
faster.
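The linear-time computation can be sketched directly on adjacency lists. This assumes the PPI is the 2-step return probability written out from the transition matrix, that is, (1/deg(i)) times the sum of 1/deg(j) over the neighbours j of i; the function name and the example graphs are hypothetical.

```python
def polya_power_index(adj):
    """2-step return probability of every node, in O(n + m) time.

    adj maps each node to a list of its neighbours. Each edge is visited
    once from each endpoint, so the total work is n + 2m operations.
    """
    inv_deg = {i: 1.0 / len(nb) for i, nb in adj.items()}
    return {i: inv_deg[i] * sum(inv_deg[j] for j in nb)
            for i, nb in adj.items()}


# Hypothetical examples: a 4-node path and a dyad.
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
dyad = {0: [1], 1: [0]}
ppi_path = polya_power_index(path4)
ppi_dyad = polya_power_index(dyad)
```

In the path, the interior nodes score 0.75 against 0.5 for the end nodes, because each interior node has a neighbour with no alternative partner.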

3 To compute beta centrality, we use UCINET 6 for Windows [22] with β = -0.2. For GPI,
we use the values of GPI3 and p whenever GPI3 = 1 in Markovsky's Table 1 [21].
Figure 4: A regular graph in which degree, closeness, betweenness, and
eigenvector centrality are the same for all nodes, but return probability and
subgraph centrality vary according to each node's position in the graph.
Vertices are labeled with the node number, its return probability for k = 5,
and its subgraph centrality. Return probability identifies the same sets of
nodes ({1, 2, 8}, {3, 5, 7}, {4, 6}) as subgraph centrality.

Figure 5: Running time for 10-step return probability and subgraph centrality
in small-world networks ranging from 1000 to 2000 nodes.



3.2 Subgraph Centrality 

Return probability can also be used to identify interesting nodes when k > 2. 
Both [13] and [23] have noted that in some regular graphs, eigenvector centrality 
[24] is equivalent to degree centrality. Both subgraph centrality and return 
probability are able to distinguish nodes from one another in such graphs. The 
label of each node in Figure 4 shows cumulative return probability for k = 5
and subgraph centrality. Both measures identify the same groupings of nodes 
and thus have the same discriminatory power. This is no surprise, because both 
measures are expressed as diagonals of powers of some matrix representation of 
a graph. Additionally, both measures count trivial closed walks (i.e. a closed 
walk made from a path starting at node i and the return path to i along the 
same edges). However, return probability counts only unique trivial and non- 
trivial closed walks - which it accomplishes by way of self-absorbing walks - 
whereas subgraph centrality counts all trivial and non-trivial closed walks. 
For small motifs, return probability runs more quickly than subgraph
centrality regardless of the size of the network, albeit at the cost of greater
memory consumption due to sparse matrices not being multiplied in place. This
can be seen in Figure 5, which shows elapsed time for 10-step return
probability and subgraph centrality in small-world networks of varying size.
The benchmarking program is written in the Python programming language and
makes use of the SciPy library for scientific computing [25]. The
eigendecomposition function is scipy.linalg.eigh, which in turn uses the robust
LAPACK and BLAS linear algebra libraries [26]. We ran the program on a computer
with a 1.8 GHz AMD Athlon 2200 processor and 768 MB RAM. Given that matrix
multiplication is highly parallel, a distributed MapReduce-style system is an
appropriate solution to the problem of computing return probability for
networks too large to fit into the memory of a single computer.

In a complete graph, both the number of closed walks of length k and the
factorial of k grow quickly. Ultimately, for sufficiently large k, k! reduces
$(A^k)_{ii}/k!$ to 0. However, the rate of growth of $(A^k)_{ii}$ also increases
rapidly with the order of A. In Figure 6, the value of k at which
$(A^k)_{ii}/k!$ reaches its maximum increases as the number of nodes increases.
Thus, the lengths of the closed walks counted by subgraph centrality are
sensitive to the size of the network. By contrast, return probability allows
one to select exactly the lengths of the walks considered by the measure.
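This size dependence can be reproduced numerically. The sketch below relies on the closed form for the number of closed walks of length k at a node of the complete graph on n nodes, ((n - 1)^k + (n - 1)(-1)^k) / n, which follows from that graph's eigenvalues n - 1 and -1; the function names and the search bound `max_k` are illustrative assumptions.

```python
from fractions import Fraction
from math import factorial


def closed_walks_complete(n, k):
    """Closed walks of length k from one node of the complete graph K_n."""
    # The numerator is always divisible by n, so integer division is exact.
    return ((n - 1) ** k + (n - 1) * (-1) ** k) // n


def peak_walk_length(n, max_k=100):
    """Walk length k >= 1 at which (A^k)_ii / k! is largest for K_n."""
    def term(k):
        return Fraction(closed_walks_complete(n, k), factorial(k))
    return max(range(1, max_k + 1), key=term)


# The dominant walk length grows with the size of the graph.
peaks = {n: peak_walk_length(n) for n in (5, 10, 20)}
```

Exact rational arithmetic is used so that near-ties between neighbouring values of k are resolved correctly.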

Both subgraph centrality and return probability can be used to quantify 
bipartivity degree, a measure of how close a network is to being bipartite [27] . 
Since a bipartite network contains no odd cycles, the number of even cycles 
divided by the number of cycles is 1. When computed for an entire graph, 
subgraph centrality is expressed as [28] : 



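\[ SC = \frac{1}{n} \sum_{i=1}^{n} SC(i) = \frac{1}{n} \sum_{i=1}^{n} e^{\lambda_i} \]

With $\lambda_j$ denoting the eigenvalues of A, this leads to the following
equation for a network's bipartivity degree in terms of subgraph centrality:

\[ \beta(G) = \frac{SC_{\mathrm{even}}}{SC} = \frac{\sum_{j=1}^{n} \cosh \lambda_j}{\sum_{j=1}^{n} e^{\lambda_j}} \]

The same can be computed in terms of a network's cumulative return
probability for walks up to length k:

\[ \beta(G) = \frac{\mathbb{P}(R_k)_{\mathrm{even}}}{\mathbb{P}(R_k)} = \frac{\sum_{j \le k,\ j \equiv 0 \pmod 2} \mathbb{P}(R_j)}{\sum_{j \le k} \mathbb{P}(R_j)} \]

Or of a node i's cumulative return probability:

\[ \beta(i) = \frac{\mathbb{P}(R_{i,k})_{\mathrm{even}}}{\mathbb{P}(R_{i,k})} = \frac{\sum_{j \le k,\ j \equiv 0 \pmod 2} \mathbb{P}(R_{i,j})}{\sum_{j \le k} \mathbb{P}(R_{i,j})} \]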

Figure 6: The number of closed walks of length k counted by subgraph centrality
for complete graphs of different size. (Plot title: $(A^k)_{ii}/k!$ for
complete graphs.)

Figure 7: The network bipartivity degree of non-bipartite weak-power networks
for walks of increasing length. (Plot title: Network bipartivity degree at
walk length k.)

Figure 8: The node bipartivity degree of nodes in a network that consists of a
complete bipartite network joined by one edge with a complete graph. Note in
(b) that a node's bipartivity degree remains 1 longer depending on its proximity
to the border node. (Plot title: Node bipartivity degree at walk length k.)

Bipartivity degree is typically a value in the range [0.5, 1]. For a bipartite
network - such as the strong-power networks in Figure 2 - β(G) is 1. When
even and odd closed walks contribute equally, β(G) is 0.5; the bipartivity
degree of a complete graph approaches 0.5 as both k and the size of the graph
grow. Figure 7 shows that the bipartivity degree of the non-bipartite
weak-power networks differs even as k increases.
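A node's bipartivity degree can be sketched from its step-wise return probabilities. The code assumes the ratio is the sum of return probabilities at even steps divided by the sum over all steps up to k, and it recomputes the return probabilities with a small dense routine so that the block stands alone; the function names and example graphs are hypothetical.

```python
from fractions import Fraction


def return_probabilities(A, k):
    """Row x-1 holds the step-x return probability of every node."""
    n = len(A)
    P = [[Fraction(a, sum(row)) for a in row] for row in A]
    current = P
    dist = []
    for x in range(1, k + 1):
        if x > 1:
            # Skipping t == i zeroes the diagonal before multiplying by P.
            current = [[sum(current[i][t] * P[t][j]
                            for t in range(n) if t != i)
                        for j in range(n)] for i in range(n)]
        dist.append([current[i][i] for i in range(n)])
    return dist


def node_bipartivity(A, i, k):
    """Even-step share of node i's return probability up to step k (k >= 2)."""
    dist = return_probabilities(A, k)
    even = sum(dist[x - 1][i] for x in range(2, k + 1, 2))
    total = sum(dist[x - 1][i] for x in range(1, k + 1))
    return even / total


# Hypothetical examples: a bipartite path and a non-bipartite triangle.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path_value = node_bipartivity(path, 0, 6)
triangle_value = node_bipartivity(triangle, 0, 3)
```

The path has no odd cycles, so every return happens at an even step and the ratio is exactly 1; the triangle admits a return at step 3, which pulls the ratio down to 2/3.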

Clearly it is possible to use bipartivity degree to distinguish communities
that tend towards homophily from communities that tend towards heterophily.
Figure 8a shows the node bipartivity degree for all nodes in a graph that
consists of a complete bipartite graph and a complete graph joined by a single
edge e. The "border" nodes are the nodes made adjacent by e. Not only are the
nodes in the two different networks clearly distinguishable from each other,
but the border nodes show a clear divergence away from the bipartivity degree
of the other nodes in their cohort and towards each other. Narrowing in on the
nodes in the bipartite graph, we can see in Figure 8b that the bipartivity
degree of the nodes farthest from the border node remains 1 for longer than
that of the nodes in the same set as the border node.



4 Discussion 

For some measures, the k in "k-step" is a parameter of the measure itself - for
example, the k parameter of return probability. When k = 1, return probability
is the inverse of degree. As k increases, the walks touch a larger neighborhood of
nodes. Thus if one wishes to compute return probability for a neighborhood of
a specific size, one simply chooses the appropriate value of k. The β parameter
of beta centrality also suggests the size of the neighborhood around node i that
is included in the computation. When β is 0, beta centrality is akin to degree.
When the absolute value of β is small, only proximal neighbors are considered,
and the neighborhood grows as |β| increases. However, β itself does not indicate
the exact length of walks. A cut-off point can nonetheless be clearly defined when
computing the approximation of beta centrality, because then one can sum the
first k terms of the series and terminate. The GPI also has a parameter, e,
but it is unrelated to the scope of the computation; the GPI is computed from
one up to the diameter of the network. Subgraph centrality has no walk length
parameter, but as we describe in subsection 3.2, the size of the neighborhood it
considers varies, being determined only by the size of the network and the
factorial denominator.
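
The cut-off described for approximate beta centrality can be sketched as follows. This is a minimal illustration, not the implementation timed in this paper. It assumes the usual series form of Bonacich's measure, c(α, β) = α Σ_t β^t A^(t+1) 1, truncated after k terms; the function name `truncated_beta_centrality`, the adjacency-list input, and the default α = 1 are assumptions made for the example.

```python
def truncated_beta_centrality(adj, beta, k, alpha=1.0):
    """Sum the first k terms of alpha * sum_t beta^t * A^(t+1) * 1.

    adj maps each node to a list of its neighbors (undirected graph).
    The series form and all names here are assumptions of this sketch.
    """
    nodes = list(adj)
    walks = {v: 1.0 for v in nodes}   # A^0 * 1: one empty walk per node
    score = {v: 0.0 for v in nodes}
    weight = 1.0                      # beta^t, starting at t = 0
    for _ in range(k):
        # One more multiplication by A: count walks that are one step longer.
        walks = {v: sum(walks[u] for u in adj[v]) for v in nodes}
        for v in nodes:
            score[v] += alpha * weight * walks[v]
        weight *= beta
    return score


path = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
degree_like = truncated_beta_centrality(path, beta=0.0, k=3)
two_terms = truncated_beta_centrality(path, beta=-0.5, k=2)
```

With β = 0 only the first term survives, so the result is the degree of each node. With negative β the walks of length 2 are subtracted, which lowers the score of the end nodes of the path relative to the middle node.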

Both beta centrality and subgraph centrality, being expressed as infinite
sums of open or closed walks, have to cope with the problem of convergence.
The root of the problem is that walks never terminate. In an undirected graph,
the number of walks of length k is always greater than the number of walks of
length k - 1. The two measures deal with this by weighting walks inversely by
length (beta centrality) or by factorial of length (subgraph centrality). When
computing beta centrality or subgraph centrality, the weighting is the same
regardless of the structure of the network. Return probability also converges,
but for different reasons, and weighting is determined by the underlying
structure of the network itself. In the star graph S_{1,n}, 2-step return
probability for the center node is 1, and for the others it is 1/n. Take S_{1,4}.
Since it is bipartite, there are no odd-length closed walks, and the return
probabilities of a non-center node for steps 1,...,6 are
0.0, 0.25, 0.0, 0.1875, 0.0, 0.1406. Equation 2 automatically scales step k by
the complement of step k - 1 so that it is a portion of what remains. Without
Equation 2 they are 0.0, 0.25, 0.0, 0.25, 0.0, 0.25, so Equation 1 is 0.25 at every
even-numbered step. This is obvious when you consider that if the walk has not
returned to the origin, it is always at the center of the star. Equation 2 scales
the return probability of step 4 down by the portion of probability already
accounted for by step 2. So if the 2-step return probability for a node i increases
due to an edge being added to one of i's neighbors, the return probability for i
at all subsequent steps is reduced in proportion to the increase at step 2.
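
The star-graph numbers above can be reproduced with a small brute-force sketch. It is not the algorithm presented in this paper. It assumes that the scaled values are first-return probabilities, which is consistent with the figures quoted for S_{1,4}. The function name and the adjacency-list representation are hypothetical.

```python
def first_return_probabilities(adj, origin, max_steps):
    """Probability that a random walk first returns to origin at each step.

    adj maps each node to a list of its neighbors. Brute-force sketch:
    walks that reach the origin are removed, so later steps only see the
    probability mass that has not yet returned.
    """
    mass = {origin: 1.0}          # probability mass of walks still wandering
    returned = []
    for _ in range(max_steps):
        nxt = {}
        for node, p in mass.items():
            share = p / len(adj[node])    # uniform choice among neighbors
            for nb in adj[node]:
                nxt[nb] = nxt.get(nb, 0.0) + share
        # Mass that lands on the origin has returned and is taken out.
        returned.append(nxt.pop(origin, 0.0))
        mass = nxt
    return returned


# Star S_{1,4}: center 0 joined to leaves 1..4.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
leaf = first_return_probabilities(star, origin=1, max_steps=6)
center = first_return_probabilities(star, origin=0, max_steps=4)
```

For a leaf the list is 0, 0.25, 0, 0.1875, 0, 0.140625. Dividing each even-step value by the mass still outstanding before that step gives 0.25 every time, which is the unscaled sequence described above.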

Beta centrality and the GPI measure power by having odd-length paths con- 
tribute positively and even-length paths contribute negatively to the value the 
measure computes for a node i. The 1-paths contribute positively to i's power, 
just as a node with higher degree has higher degree centrality. The 2-paths 
detract from i's power because they provide neighbors with the opportunity to 
exchange with some node other than i. Two-step return probability functions 
similarly in that it increases when the degree of i's neighbors decreases. The 
less the opportunity i's neighbors have to exchange with a node other than i, 
the greater is i's 2-step return probability and the greater its power. 
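
A minimal sketch of this behavior, assuming a simple random walk on an undirected graph without self-loops, so that the 2-step return probability of node i is the sum over its neighbors j of 1/(deg(i) deg(j)). The function name and the example graphs are hypothetical. Each edge is visited a constant number of times, which matches the O(n + m) bound stated for the Polya power index.

```python
def two_step_return_probability(adj):
    """2-step return probability of every node in O(n + m) time.

    adj maps each node to a list of its neighbors. Assumes an undirected
    graph with no self-loops and no isolated nodes.
    """
    inv_deg = {v: 1.0 / len(nbrs) for v, nbrs in adj.items()}
    # Step out to neighbor u with probability 1/deg(v), then back with 1/deg(u).
    return {v: inv_deg[v] * sum(inv_deg[u] for u in nbrs)
            for v, nbrs in adj.items()}


# Path A - B - C: both neighbors of B can only exchange with B.
before = two_step_return_probability(
    {"A": ["B"], "B": ["A", "C"], "C": ["B"]})

# Adding the edge C - D gives C an alternative partner, so B loses power.
after = two_step_return_probability(
    {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]})
```

In the first graph B has a 2-step return probability of 1.0. After C gains a second neighbor, the value for B drops to 0.75 while its own degree is unchanged.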

Using "k-step measure" as the name for the measures discussed in this paper
makes the category somewhat more general, albeit in name only. The category
could be made even more general. k-step measures compute some value for a
node i by considering a sequence of increasingly larger sets of nodes by starting
with a set containing only i, then adding the neighbors of i, and the neighbors
of those neighbors, and so on. In mathematical morphology, dilation δ(g) is an
operation on a subgraph g of graph G which adds to g the nodes of G adjacent
to those of g [29]. A d-dilation is the application of δ d times:

\[ \delta^{d}(g) = \underbrace{\delta(\delta(\ldots(g)\ldots))}_{d} \]

The process of constructing the sequence of sets considered by a k-step measure
is just a series of k dilations. This process is a constrained form of dilation
insofar as it always starts with a subgraph of one node. A more general definition
- one that permits inclusion of a greater number of measures - would have 
two components: (1) a generic process that constructs a sequence of node sets 
by repeatedly dilating an arbitrary subgraph g of G; and (2) an unspecified 
computation which takes as input the sequence of node sets generated by the 
first component. 
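
The first component of this more general definition can be sketched as below. This is an illustration under stated assumptions, not part of the method evaluated in this paper. The function names `dilate` and `dilation_sequence` and the adjacency-list input are hypothetical, and only node sets are tracked.

```python
def dilate(adj, nodes):
    """One dilation: add every node adjacent to a node already in the set."""
    grown = set(nodes)
    for v in nodes:
        grown.update(adj[v])
    return grown


def dilation_sequence(adj, seed, k):
    """Node sets produced by 0, 1, ..., k dilations of an arbitrary seed set."""
    current = set(seed)
    sequence = [current]
    for _ in range(k):
        current = dilate(adj, current)
        sequence.append(current)
    return sequence


# Path a - b - c - d, dilated twice from the single node a.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
layers = dilation_sequence(path, {"a"}, 2)

# A stand-in for the second component: any computation over the sequence.
sizes = [len(s) for s in layers]
```

Starting from a single node reproduces the constrained case used by k-step measures. Passing a larger seed set gives the generalized form.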

It may be that 2-step return probability is a close approximation of
structural exclusive power because it resembles the model of Markovsky's WeakNet
simulation software in which actors "(i) seek one exchange per round, (ii) seek
to exchange with a randomly selected other, and (iii) keep seeking exchange in
a given round until no more potential exchange partners remain" [21]. If A and
B are connected, the probability that A seeks to exchange with B and B seeks
to exchange with A is just the product of the inverse of the degrees of A and
B. Similarly, 2-step return probability is the probability that A will be chosen
as an exchange partner by any randomly-selected neighbor.

The performance improvements that return probability shows over beta
centrality and subgraph centrality should be understood in the context of practical
analysis. If one wishes to compute subgraph centrality and one already has
the eigenvalues and eigenvectors of the adjacency matrix - say, for use by some
other algorithm - computing return probability would be wasteful. An analyst
who needs to identify powerful nodes in a somewhat large network may be able
to get an answer in a reasonable amount of time by using the approximate
version of the beta centrality algorithm and stopping the computation after a
small number of terms. Where we imagine return probability being most useful is in
the analysis of extremely large networks. When a network's nodes number in
the millions (e.g., online social networks, the World Wide Web), the
computational efficiency of return probability makes it an attractive alternative
to beta centrality or subgraph centrality.

5 Conclusion 

We have presented an algorithm for computing return probability for networks.
The measure is probabilistic, so it requires no normalization, and it permits
exact control over the size k of the neighborhood it forms around a node.
Because it shows agreement with beta centrality and the GPI, the Polya power
index appears to be a measure of relative power in networks. It is also just
as capable as subgraph centrality at classifying nodes based on features of a
network that can only be identified by looking at longer-distance relationships.
Further, the time complexity of return probability - O(n + m) for the Polya
power index and proportional to k × nnz_k when k > 2 - is lower than that of
either beta centrality or subgraph centrality and lends itself to easier analysis
of extremely large networks.
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